Personal Name Resolution of Web People Search

Authors

  • Krisztian Balog
  • Leif Azzopardi
  • Maarten de Rijke
Abstract

Disambiguating personal names in a set of documents (such as the web pages returned in response to a person-name query) is a challenging task. In this paper, we explore the extent to which the “cluster hypothesis” holds for this task (i.e., that similar documents tend to represent the same person). We examine two clustering techniques that use either (1) term-based matching (single pass clustering) or (2) semantic-based matching (Probabilistic Latent Semantic Analysis). We compare and contrast these strategies and provide strong evidence that the hypothesis holds for the former. Indeed, on the new evaluation platform of the SemEval 2007 Web People Search task, we show that single pass clustering with a standard IR document representation fits well with the assumptions about the data and the task, and yields state-of-the-art performance.
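
To make the term-based strategy concrete, the following is a minimal sketch of single pass clustering over TF-IDF vectors with cosine similarity. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the document representation, similarity measure, or threshold, so the whitespace tokenisation, the smoothed IDF weighting, the summed-centroid update, and the function names `tf_idf_vectors`, `cosine`, and `single_pass_cluster` are all hypothetical choices.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn raw strings into sparse TF-IDF vectors (dict: term -> weight).

    Assumption: whitespace tokenisation and a smoothed IDF; the paper's
    actual representation is not given in the abstract.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold):
    """Assign each document, in input order, to the most similar cluster.

    A document joins the best-matching cluster if the similarity to that
    cluster's centroid reaches `threshold`; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of document indices.
    """
    centroids = []  # summed member vectors (cosine ignores the scale)
    clusters = []
    for idx, vec in enumerate(vectors):
        best_sim, best_cluster = -1.0, None
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_sim, best_cluster = sim, c
        if best_cluster is not None and best_sim >= threshold:
            clusters[best_cluster].append(idx)
            centroid = centroids[best_cluster]
            for term, weight in vec.items():
                centroid[term] = centroid.get(term, 0.0) + weight
        else:
            clusters.append([idx])
            centroids.append(dict(vec))
    return clusters
```

Each returned cluster is meant to correspond to one real-world person sharing the queried name. Because documents are processed once and in order, the result depends on both the input ordering and the threshold, which would need tuning on held-out data.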

Similar resources

Searching for people on Web search engines

The Web is a communication and information technology that is often used for the distribution and retrieval of personal information. Many people and organizations mount Web sites containing large amounts of information on individuals, particularly about celebrities. However, limited studies have examined how people search for information on other people, using personal names, via Web search eng...


Disambiguating Personal Names on the Web Using Automatically Extracted Key Phrases

When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces unique phrases to disambiguate different people with the same name (i.e. namesakes). Our algorithm ta...


CWePS: Chinese Web People Search

Name ambiguity is a big problem in personal information retrieval, especially given the explosive growth of Web data. In this demonstration, we present a prototype Chinese Web People Search system, called CWePS. Given a personal name as query, CWePS collects the top results from the existing search engines, and groups these returned pages into several clusters. Ideally, the Webpages in the same...


Improving the performance of personal name disambiguation using web directories

Frequent requests from users to search engines on the World Wide Web are to search for information about people using personal names. Current search engines only return sets of documents containing the name queried, but, as several people usually share a personal name, the resulting sets often contain documents relevant to several people. It is necessary to disambiguate people in these result s...


Extracting Key Phrases to Disambiguate Personal Names on the Web

When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further n...


Automatic Annotation of Ambiguous Personal Names on the Web

Personal name disambiguation is an important task in social network extraction, evaluation and integration of ontologies, information retrieval, cross-document co-reference resolution and word sense disambiguation. We propose an unsupervised method to automatically annotate people with ambiguous names on the web using automatically extracted keywords. Given an ambiguous personal name, first, we...




Publication date: 2008